Show code
pacman::p_load(sf, tidyverse, sfdep, spdep, tmap, lubridate, ggplot2, Kendall)March 3, 2024
Dengue Hemorrhagic Fever (in short dengue fever) is one of the most widespread mosquito-borne diseases in the most tropical and subtropical regions. It is an acute disease caused by dengue virus infection which is transmitted by female Aedes aegypti and Aedes albopictus mosquitoes. In 2015, Taiwan had recorded the most severe dengue fever outbreak with more than 43,000 dengue cases and 228 deaths. Since then, the annual reported dengue fever cases were maintained at the level of not more than 200 cases. However, in 2023, Taiwan recorded 26703 dengue fever cases. More than 25,000 cases were reported at Tainan City, and more than 80% of the reported dengue fever cases occurred in the month August-November 2023 and epidemiology week 31-50.
We will be exploring the following:
if the distribution of dengue fever outbreak at Tainan City, Taiwan are independent from space and space and time.
If the outbreak is indeed spatial and spatio-temporal dependent, and if so discover where are the clusters and outliers, and the emerging hot spot/cold spot areas.
The R packages that we will be using in this exercise are as follows:
sf: for importing and handling geospatial data in R
tidyverse: a collection of packages for data science tasks
sfdep: Used to compute spatial weights, global and local spatial autocorrelation statistics, and emerging hotspot analysis (EHSA)
spdep: Similar to sfdep.
tmap: Provides functions for plotting cartographic quality static point patterns maps or interactive maps by using leaflet API.
lubridate: For manipulating datetime variables.
ggplot2: Allows for simple visualisations like bar charts and line plots.
Kendall: Provides the tool to perform Mann-Kendall test
| Name | Description | File Type |
|---|---|---|
| TAINAN_VILLAGE | Geospatial data of village boundary of Taiwan | ESRI Shapefile |
| Dengue_Daily.csv | Aspatial data of reported dengue cases in Taiwan since 1998. | CSV Format |
The datasets that we will be using are as follow:
We will read the shape file in as a sf data frame using st_read(), transform it using st_transform() to the right CRS which is 3829, then confine the scope to counties specified in the assignment brief using filter() on ‘TOWNID’.
Reading layer `TAINAN_VILLAGE' from data source
`/Users/jacksontan/Documents/Sashimii0219/IS415-GAA/Take-home_Ex/Take-home_Ex02/data/geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 649 features and 10 fields
Geometry type: POLYGON
Dimension: XY
Bounding box: xmin: 120.0269 ymin: 22.88751 xmax: 120.6563 ymax: 23.41374
Geodetic CRS: TWD97
Importing Tainan Polygons
Reading layer `TAINAN_VILLAGE' from data source
`/Users/jacksontan/Documents/Sashimii0219/IS415-GAA/Take-home_Ex/Take-home_Ex02/data/geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 649 features and 10 fields
Geometry type: POLYGON
Dimension: XY
Bounding box: xmin: 120.0269 ymin: 22.88751 xmax: 120.6563 ymax: 23.41374
Geodetic CRS: TWD97
Rows: 258
Columns: 11
$ VILLCODE <chr> "67000350032", "67000270011", "67000370005", "67000330004",…
$ COUNTYNAME <chr> "臺南市", "臺南市", "臺南市", "臺南市", "臺南市", "臺南市",…
$ TOWNNAME <chr> "安南區", "仁德區", "中西區", "南區", "安南區", "安南區", "…
$ VILLNAME <chr> "青草里", "保安里", "赤嵌里", "大成里", "城北里", "城南里",…
$ VILLENG <chr> "Qingcao Vil.", "Bao'an Vil.", "Chihkan Vil.", "Dacheng Vil…
$ COUNTYID <chr> "D", "D", "D", "D", "D", "D", "D", "D", "D", "D", "D", "D",…
$ COUNTYCODE <chr> "67000", "67000", "67000", "67000", "67000", "67000", "6700…
$ TOWNID <chr> "D06", "D32", "D08", "D02", "D06", "D06", "D08", "D06", "D0…
$ TOWNCODE <chr> "67000350", "67000270", "67000370", "67000330", "67000350",…
$ NOTE <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ geometry <POLYGON [m]> POLYGON ((203866.5 2555960,..., POLYGON ((215123.1 …
First look at the dataset reveals that there are several columns that are either in Traditional Chinese or contain IDs that we do not need. In that case, let’s drop those columns and the last 4 letters (spacebar + Vil.) of VILLENG column:
In the case of Chinese Pinyin, different variations of similar sounding Chinese words can end up the same when translated to English.
Simple feature collection with 9 features and 3 fields
Geometry type: POLYGON
Dimension: XY
Bounding box: xmin: 211783.7 ymin: 2538314 xmax: 216840.6 ymax: 2554862
Projected CRS: Hu Tzu Shan 1950 / UTM zone 51N
VILLCODE VILLNAME village geometry
111 67000350046 溪東里 Xidong POLYGON ((212090.2 2549717,...
112 67000350038 溪北里 Xibei POLYGON ((213637.7 2549689,...
128 67000270012 成功里 Chenggong POLYGON ((214629 2542198, 2...
137 67000320006 仁和里 Renhe POLYGON ((215772.9 2543004,...
146 67000310019 復興里 Fuxing POLYGON ((216584.4 2546984,...
214 67000340008 仁愛里 Ren'ai POLYGON ((214700 2547025, 2...
245 67000310024 成功里 Chenggong POLYGON ((216133 2546922, 2...
246 67000310025 中興里 Zhongxing POLYGON ((216431.3 2547398,...
253 67000350003 塭南里 Wennan POLYGON ((214620.8 2554862,...
We can see that there are 9 such instances. Let’s take a closer look at the first entry.
Simple feature collection with 2 features and 3 fields
Geometry type: POLYGON
Dimension: XY
Bounding box: xmin: 209825.5 ymin: 2539524 xmax: 212761.9 ymax: 2549719
Projected CRS: Hu Tzu Shan 1950 / UTM zone 51N
VILLCODE VILLNAME village geometry
82 67000330014 喜東里 Xidong POLYGON ((212211 2540508, 2...
111 67000350046 溪東里 Xidong POLYGON ((212090.2 2549717,...
The first one is called Xi3 Dong, while the second one is Xi1 Dong. We will manually change them by adding their village ID in brackets to one of each duplicates.
Simple feature collection with 0 features and 3 fields
Bounding box: xmin: NA ymin: NA xmax: NA ymax: NA
Projected CRS: Hu Tzu Shan 1950 / UTM zone 51N
[1] VILLCODE VILLNAME village geometry
<0 rows> (or 0-length row.names)
Simple feature collection with 258 features and 3 fields
Geometry type: POLYGON
Dimension: XY
Bounding box: xmin: 198147.4 ymin: 2534651 xmax: 221655.9 ymax: 2556729
Projected CRS: Hu Tzu Shan 1950 / UTM zone 51N
First 10 features:
VILLCODE VILLNAME village geometry
1 67000350032 青草里 Qingcao POLYGON ((203866.5 2555960,...
2 67000270011 保安里 Bao'an POLYGON ((215123.1 2539291,...
3 67000370005 赤嵌里 Chihkan POLYGON ((212263.9 2546464,...
4 67000330004 大成里 Dacheng POLYGON ((211948.4 2544453,...
5 67000350028 城北里 Chengbei POLYGON ((205015.6 2553859,...
6 67000350030 城南里 Chengnan POLYGON ((204553 2554303, 2...
7 67000370009 法華里 Fahua POLYGON ((213068.1 2544770,...
8 67000350017 海南里 Hainan POLYGON ((209616.5 2549009,...
9 67000350049 國安里 Guo'an POLYGON ((210817.8 2549594,...
10 67000350018 溪心里 Xixin POLYGON ((210566.2 2553279,...
No more duplicates!. Next we will check for invalid geometries and missing values.
[1] 0
character(0)
Simple feature collection with 0 features and 3 fields
Bounding box: xmin: NA ymin: NA xmax: NA ymax: NA
Projected CRS: Hu Tzu Shan 1950 / UTM zone 51N
[1] VILLCODE VILLNAME village geometry
<0 rows> (or 0-length row.names)
There are no invalid geometries or missing values, great! Now let’s do a simple visualization to check if we extracted the right data.
# A tibble: 106,861 × 26
發病日 個案研判日 通報日 性別 年齡層 居住縣市 居住鄉鎮 居住村里
<date> <chr> <date> <chr> <chr> <chr> <chr> <chr>
1 1998-01-02 None 1998-01-07 男 40-44 屏東縣 屏東市 None
2 1998-01-03 None 1998-01-14 男 30-34 屏東縣 東港鎮 None
3 1998-01-13 None 1998-02-18 男 55-59 宜蘭縣 宜蘭市 None
4 1998-01-15 None 1998-01-23 男 35-39 高雄市 苓雅區 None
5 1998-01-20 None 1998-02-04 男 55-59 宜蘭縣 五結鄉 None
6 1998-01-22 None 1998-02-19 男 20-24 桃園市 蘆竹區 None
7 1998-01-23 None 1998-02-02 男 40-44 新北市 新店區 None
8 1998-01-26 None 1998-02-19 女 65-69 台北市 北投區 None
9 1998-02-11 None 1998-02-13 女 25-29 台南市 南區 None
10 1998-02-16 None 1998-02-24 男 20-24 高雄市 楠梓區 None
# ℹ 106,851 more rows
# ℹ 18 more variables: 最小統計區 <chr>, 最小統計區中心點X <chr>,
# 最小統計區中心點Y <chr>, 一級統計區 <chr>, 二級統計區 <chr>,
# 感染縣市 <chr>, 感染鄉鎮 <chr>, 感染村里 <chr>, 是否境外移入 <chr>,
# 感染國家 <chr>, 確定病例數 <dbl>, 居住村里代碼 <chr>, 感染村里代碼 <chr>,
# 血清型 <chr>, 內政部居住縣市代碼 <chr>, 內政部居住鄉鎮代碼 <chr>,
# 內政部感染縣市代碼 <chr>, 內政部感染鄉鎮代碼 <chr>
Similar to the previous dataset, a significant portion of the data are in Traditional Chinese. Fortunately, the only columns we need are 發病日, 最小統計區中心點X, and 最小統計區中心點Y, which translates to Onset Date, X coordinates and Y coordinates. Let’s drop the columns that we don’t need and translate the column names to English.
From the date variable, we will also need to extract the epidemiology week and year from the onset_date variable for analysis, then confine it to epidemiology week 31-50, 2023.
Resetting the levels in the factors.
In order to do spatial analysis, we will need to convert the data to sf dataframe using st_as_sf().
Next, in order to analyse these data by their respective village, we will use st_intersection on dengue_sf and tainan_clean to add their respective village to each data point. We will also drop both the onset_date and year variable as we will not need them anymore.
Simple feature collection with 18816 features and 6 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 201081.8 ymin: 2535316 xmax: 220420.6 ymax: 2555629
Projected CRS: Hu Tzu Shan 1950 / UTM zone 51N
# A tibble: 18,816 × 7
onset_date year epiweek VILLCODE VILLNAME village geometry
* <date> <fct> <fct> <chr> <chr> <chr> <POINT [m]>
1 2023-09-14 2023 37 67000350… 青草里 Qingcao (203370.3 2555156)
2 2023-10-12 2023 41 67000350… 青草里 Qingcao (203370.3 2555156)
3 2023-08-04 2023 31 67000270… 保安里 Bao'an (216211.6 2537686)
4 2023-09-10 2023 37 67000270… 保安里 Bao'an (217062.4 2537902)
5 2023-09-11 2023 37 67000270… 保安里 Bao'an (215275.6 2537869)
6 2023-09-16 2023 37 67000270… 保安里 Bao'an (215831.1 2537495)
7 2023-09-19 2023 38 67000270… 保安里 Bao'an (215377 2538568)
8 2023-09-20 2023 38 67000270… 保安里 Bao'an (215604.8 2538239)
9 2023-09-27 2023 39 67000270… 保安里 Bao'an (215831.1 2537495)
10 2023-09-29 2023 39 67000270… 保安里 Bao'an (216211.6 2537686)
# ℹ 18,806 more rows
Again, we will check for invalid geometries and missing values.
[1] 0
character(0)
Simple feature collection with 0 features and 2 fields
Bounding box: xmin: NA ymin: NA xmax: NA ymax: NA
Projected CRS: Hu Tzu Shan 1950 / UTM zone 51N
# A tibble: 0 × 3
# ℹ 3 variables: village <chr>, epiweek <fct>, geometry <GEOMETRY [m]>
We will count the total number of cases for each village using the following, then appending it back to the village polygon geometry.
We will now do the same but with village and epiweek for each group, in order for us to create a spacetime cube.
# A tibble: 5,160 × 2
village epiweek
<chr> <fct>
1 Andong 31
2 Andong 32
3 Andong 33
4 Andong 34
5 Andong 35
6 Andong 36
7 Andong 37
8 Andong 38
9 Andong 39
10 Andong 40
# ℹ 5,150 more rows
We will be creating our spacetime cube for subsequent emerging hot spot analysis.
Ensuring that our spacetime object is TRUE.
Before we continue, we will first plot the data points together with their respective village to have a quick overview of our spatial data on hand.